Abstract

We investigate the effects of heterogeneous and clustered contact patterns on 
the timescale and final size of infectious disease epidemics. The abundance of 
transitive relationships (the number of 3 cliques) in a network and the variance 
of the degree distribution are shown to have large effects on the number ulti- 
mately infected and how quickly the epidemic propagates. The network model 
is based on a simple generalization of the configuration model, and epidemic 
dynamics are modeled with a low dimensional system of ordinary differential 
equations. Because of the simplicity of this model, we are able to explore a 
large parameter space and characterize dynamics over a wide range of network 
topologies. We find that the interaction between clustering and the degree distri- 
bution is complex, and that clustering always slows down an epidemic, but that 
simultaneously increasing clustering and variance of the degree distribution can 
potentially increase final epidemic size. In contrast to solutions for unclustered 
configuration model networks, we find that bond percolation solutions for the 
final epidemic size are potentially biased if they do not take variable infectious 
periods into account.

The degree distribution and the clustering coefficient [1] are two of the most
commonly investigated features of networks with respect to diffusion and
epidemic processes. We consider the problem of the dynamic spread of an
infectious disease in a population modeled as a random graph. While the
influence of the degree distribution on epidemic dynamics is now a well studied
problem [2, 3, 4, 5], there is still substantial debate about the effects of
clustering. Some investigations have indicated that clustering may decrease
epidemic thresholds, in effect making it more likely that an epidemic will
occur following an initial introduction [6]. Other studies have found that the
relationship between clustering and epidemic thresholds is complex [7, 8, 9],
and depends on how clustering is introduced into the population. The effects of
clustering on the timescale of an epidemic are less ambiguous; clustering will
decrease the rate of epidemic propagation. The approach developed here is
particularly well suited to investigate the interaction between clustering and
the degree distribution in affecting epidemic timescales. We find that
clustering always slows down an epidemic, but the timescale of the epidemic is
much more sensitive to the variance of the degree distribution.

Following the approach laid out in [10, 11], the networks we use are
straightforward generalizations of the configuration model [12]. These networks
allow for easy tuning of the degree distribution and the number of 3 cliques in
the network, which is related to the clustering coefficient. As shown in
[10, 11], and more recently in [13] and [14], these networks are not locally
tree-like, but can nevertheless be analyzed with the tools of branching
processes and percolation theory.

Our epidemic model is a generalization of the approach presented in [15],
and consists of a low dimensional system of ordinary differential equations which
describes the prevalence of infection over time. Recently an alternative system
of ODEs was independently developed in [16] which also describes epidemics
in networks with arbitrary degree distribution and clustering. Our approach
complements the one taken in [16] by providing a solution based on a simple
class of random networks that are easily examined using branching processes and
percolation theory. Under some circumstances, our model is in close agreement
with the one presented in [16], but around epidemic thresholds, they can differ
substantially. This comparison suggests that an accurate account of the effects
of clustering requires consideration of more than just the clustering coefficient.
Epidemic dynamics are also affected by details of how clustering is introduced
into the network.

We also revisit the problem of bond percolation for calculating the giant
component size in networks with clustering [10, 11]. Recently a variation on
this class of random networks was investigated in [14], whose authors showed how
to correctly take the infectious period into account when calculating final
epidemic size using bond percolation. In unclustered networks generated by the
configuration model, bond percolation approximations for final epidemic size
that neglect variable infectious periods have negligible bias [15, 17]. Yet we find
that in highly clustered networks, bond percolation approximations for
epidemic size can be very biased if variable infectious periods are not taken into
account.

1. Methods 

We consider a basic susceptible-infected-recovered model. Infectious nodes
transmit to neighbors at a constant rate $\beta$. Infectious nodes transition to the
recovered state at a constant rate $\gamma$. Once recovered, the node cannot be
reinfected, and can no longer infect neighbors.

Our solutions will be based on the class of undirected random graphs
originally described in [10, 11], which are refinements of bipartite configuration
models [12, 13]. A node can be a member of multiple 2 or 3 cliques (a clique
is a completely connected subgraph). A 2 clique is just a pair of nodes with
an edge between them, and we will call these lines. A 3 clique is three nodes
with all three possible edges, which we will sometimes call triangles. Each node
is a member of a random number of lines and triangles. The probability that
a node is a member of $l$ lines and $t$ triangles is described by the probability
mass function $p_{l,t}$. Finally, our model will make heavy use of the probability
generating function (pgf) for this distribution:

$$ g(x, y) = \sum_{l,t} p_{l,t}\, x^l y^t. $$

When differentiating the pgf, we will use superscripts so that for example $g^{(x)}$
would indicate the first derivative with respect to $x$ and $g^{(x,x)}$ would indicate the
2nd derivative with respect to $x$. The pgf can be used to calculate many useful
properties of the graph; for example, the number of half-lines is proportional to
$M := \sum_{l,t} p_{l,t} \times l = g^{(x)}(1,1)$. The number of lines in the network is
proportional to $g^{(x)}(1,1)/2$ (because there are two nodes for every line). And the
number of triangles in the network is proportional to
$M_\triangle := g^{(y)}(1,1)/3$ (because there are three nodes for every triangle).

Random graphs of the kind described in [10, 11] can be easily generated by
assigning a random number of lines and triangles to a set of $N$ nodes from the
distribution $p_{l,t}$. After ensuring that the total numbers of line and triangle
memberships are divisible by 2 and 3 respectively, edges can then be created by

1. generating a set of half-lines or "stubs", such that the number of times a
node appears in the set is equal to the number of lines to which it belongs,

2. generating a set of "corners", such that the number of times a node appears
in the set is equal to the number of triangles to which it belongs,

3. repeatedly constructing an edge between 2 stubs drawn at random and
without replacement,

4. and, repeatedly constructing edges between three corners drawn at random
and without replacement.

This algorithm may produce loops and double-edges, but the frequency of such
edges will be $O(1/N)$.
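
The four steps above can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function name `generate_clustered_network` and its arguments are hypothetical, shuffling followed by grouping consecutive entries stands in for repeated draws without replacement, and loops and double-edges are kept rather than removed.

```python
import random

def generate_clustered_network(lines, triangles, seed=None):
    """Build an edge list from per-node line and triangle counts.

    lines[i] and triangles[i] are the numbers of lines and triangles that
    node i belongs to (hypothetical input format).
    """
    rng = random.Random(seed)
    # Step 1: a node appears once in `stubs` for every line it belongs to.
    stubs = [node for node, n_l in enumerate(lines) for _ in range(n_l)]
    # Step 2: a node appears once in `corners` for every triangle it belongs to.
    corners = [node for node, n_t in enumerate(triangles) for _ in range(n_t)]
    if len(stubs) % 2 != 0 or len(corners) % 3 != 0:
        raise ValueError("need an even number of stubs and corners divisible by 3")
    # A uniform shuffle followed by consecutive grouping is equivalent to
    # drawing at random without replacement.
    rng.shuffle(stubs)
    rng.shuffle(corners)
    edges = []
    # Step 3: join stubs in pairs.
    for i in range(0, len(stubs), 2):
        edges.append((stubs[i], stubs[i + 1]))
    # Step 4: join corners in threes, adding all three edges of each triangle.
    tris = []
    for i in range(0, len(corners), 3):
        a, b, c = corners[i:i + 3]
        tris.append((a, b, c))
        edges.extend([(a, b), (b, c), (a, c)])
    return edges, tris
```

Whatever the pairing, a node with $l$ lines and $t$ triangles ends up with degree $l + 2t$, which gives a quick sanity check on the output.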

Our solution is based on the idea that the number of transmissions per unit
time is a linear function of several time dependent variables:

1. $M_{SI}(t) \propto$ the number of lines that begin at a susceptible node and
terminate at an infectious node,

2. $n_{21}(t) \propto$ the number of triangles with two susceptible nodes and one
infectious node,

3. $n_{12}(t) \propto$ the number of triangles with one susceptible and two infectious
nodes, and

4. $n_{11}(t) \propto$ the number of triangles with one susceptible and one infectious
node (the third node having recovered).

In particular, we will assume that the number of transmissions per unit time
over lines and triangles are respectively proportional to

$$ T_2 = \beta M_{SI}, $$

$$ T_3 = \beta (2 n_{21} + 2 n_{12} + n_{11}). $$

Following the methods in [15, 20], we will describe epidemic dynamics with a
set of ODEs in terms of the $M$ and $n$ variables as well as two survivor functions:

1. $\theta_2(t)$: the probability that a neighbor in a 2 clique (or "line") has not
transmitted infection prior to time $t$, and

2. $\theta_3(t)$: the probability that both neighbors in a 3 clique (or "triangle")
have not transmitted infection prior to time $t$.

As in [15, 20], we conclude that the probability that a node with $l'$ lines and $t'$
triangles remains susceptible is $\theta_2^{l'} \theta_3^{t'}$. Consequently the fraction of the
population, $S$, that remains susceptible at any time is

$$ S = \sum_{l,t} p_{l,t}\, \theta_2^l \theta_3^t = g(\theta_2, \theta_3). $$

The probability that an edge beginning at a susceptible node will terminate
at an infectious node is $M_{SI}/M_S$, where $M_S$ is proportional to the number of
half-edges or stubs connected to susceptible nodes. Similarly, the probability
that a susceptible node is connected to a 3 clique with $i$ susceptible nodes and $j$
infectious nodes is $i \times n_{ij}/M_{S\triangle}$, where $M_{S\triangle}$ is proportional to the
number of links from a susceptible node to a 3 clique. These two variables are
easily expressed in terms of the pgf:

$$ M_S = \sum_{l,t} l \times p_{l,t}\, \theta_2^l \theta_3^t = \theta_2\, g^{(x)}(\theta_2, \theta_3), $$

$$ M_{S\triangle} = \sum_{l,t} t \times p_{l,t}\, \theta_2^l \theta_3^t = \theta_3\, g^{(y)}(\theta_2, \theta_3). $$

Note that these quantities are smaller than the actual counts by a factor of $1/N$,
where $N$ is the population size.

The system of ODEs relies on several more variables derived from the
generating function. When a transmission event occurs, lines and triangles that
were formerly counted among $M_{SS}$ or $n_{21}$ may instead be counted among $M_{SI}$
or $n_{12}$. Quantifying the magnitude of these changes requires that we calculate
the average degree of a newly infected node. This is accomplished with the
excess degree distribution and its corresponding generating function [21]. We
will denote as $q_{l,t}$ the probability that there are $l$ lines and $t$ triangles connected
to a susceptible node that we reach by following a line from an infectious to a
susceptible node, not counting the line by which we arrived. Similarly, $r_{l,t}$ will
be the probability that if we follow a 3 clique to a susceptible node, there
are $l$ lines and $t$ 3 cliques connected to that node, not counting the one by which
we arrived. Then we have the generating functions

$$ g_q(x, y) = \sum_{l,t} q_{l,t}\, x^l y^t = g^{(x)}(\theta_2 x, \theta_3 y) / g^{(x)}(\theta_2, \theta_3), \qquad (1) $$

$$ g_r(x, y) = \sum_{l,t} r_{l,t}\, x^l y^t = g^{(y)}(\theta_2 x, \theta_3 y) / g^{(y)}(\theta_2, \theta_3). $$

The mean number of lines and triangles in these joint distributions gives us the
expected number of lines or triangles of a newly infected node. We denote the
means as $\delta_{ij}$, which is the average excess number of type-$j$ links for a susceptible
node selected with probability proportional to the number of type-$i$ links, where
$i, j \in \{l, t\}$ stand for lines and triangles. Using the generating functions, we have

$$ \delta_{ll} = g_q^{(x)}(1,1), \qquad \delta_{lt} = g_q^{(y)}(1,1), $$

$$ \delta_{tl} = g_r^{(x)}(1,1), \qquad \delta_{tt} = g_r^{(y)}(1,1). \qquad (2) $$
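The hazard of infection along a single edge is proportional to the probability
that the edge terminates at an infectious node ($M_{SI}/M_S$) and the transmission
rate. As in the equations derived in [15], this implies

$$ \dot{\theta}_2 = -\theta_2 \frac{T_2}{M_S}, \qquad (3) $$

$$ \dot{\theta}_3 = -\theta_3 \frac{T_3}{M_{S\triangle}}. \qquad (4) $$

Here $M_{S\triangle}$ denotes the triangle analogue of $M_S$, proportional to the number
of links from susceptible nodes to 3 cliques.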
Dynamics of $M_{SI}$ and $M_{SS}$ require careful consideration of how edges are
rearranged following a transmission event. $\dot{M}_{SS}$ describes the time derivative
of the (relative) number of lines between susceptibles. $T_2$ transmissions occur
per unit time along lines, and the newly infected individual is connected to an
average of $\delta_{ll}$ lines other than the one by which it was infected. With probability
$M_{SS}/M_S$, one of these lines terminates at a susceptible node. Therefore $M_{SS}$
will decrease at a rate of $2 T_2 \delta_{ll} M_{SS}/M_S$; the factor of 2 arises because each
line between susceptibles is counted once from each end. Furthermore, $T_3$
transmissions will occur via triangles, and the newly infected node will be
connected to an expected number $\delta_{tl}$ of lines. Each of these will also terminate
at a susceptible node with probability $M_{SS}/M_S$. Then we conclude

$$ \dot{M}_{SS} = -2 \frac{M_{SS}}{M_S} \left( T_2 \delta_{ll} + T_3 \delta_{tl} \right). \qquad (5) $$

Similar reasoning leads to the equation for $\dot{M}_{SI}$. The edge rearrangement
follows a similar pattern as for $M_{SS}$, but we must also account for the
increase of $M_{SI}$ when a newly infected node is connected to another susceptible
(with probability $M_{SS}/M_S$), as well as the decrease when the new infection has
connections to other infecteds (with probability $M_{SI}/M_S$). Taking this into
account yields terms of the form $(T_2 \delta_{ll} + T_3 \delta_{tl}) M_{XY}/M_X$. And, in addition
to the edge-rearrangement terms, we must account for changes due to recovery
($-\gamma M_{SI}$) and direct transmission ($-\beta M_{SI}$).

$$ \dot{M}_{SI} = -M_{SI}(\beta + \gamma) + \left( T_2 \delta_{ll} + T_3 \delta_{tl} \right) \left( \frac{M_{SS}}{M_S} - \frac{M_{SI}}{M_S} \right). \qquad (6) $$



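Finally, the equations for the number of 3 cliques with $i$ susceptible and $j$
infectious constituents, $n_{ij}$, are found by considering rearrangements as above,
as well as flow between classes that are due to an infectious member of the 3
clique transmitting to a susceptible member, or recovering. For example, a 3
clique with one susceptible and two infectious nodes (state (1,2)) will transition
to the state (0,3) at the rate $2\beta$, since there are two edges between susceptibles
and infecteds in this clique. And it will transition to the state (1,1) at the
rate $2\gamma$, since there are two infectious nodes in the clique that can recover. To
summarize, we find

$$ \dot{n}_{30} = -\left( T_3 \delta_{tt} + T_2 \delta_{lt} \right) \frac{3 n_{30}}{M_{S\triangle}}, $$

$$ \dot{n}_{21} = -(2\beta + \gamma) n_{21} + \left( T_3 \delta_{tt} + T_2 \delta_{lt} \right) \left( \frac{3 n_{30}}{M_{S\triangle}} - \frac{2 n_{21}}{M_{S\triangle}} \right), $$

$$ \dot{n}_{20} = \gamma n_{21} - \left( T_3 \delta_{tt} + T_2 \delta_{lt} \right) \frac{2 n_{20}}{M_{S\triangle}}, $$

$$ \dot{n}_{12} = 2\beta n_{21} - (2\beta + 2\gamma) n_{12} + \left( T_3 \delta_{tt} + T_2 \delta_{lt} \right) \left( \frac{2 n_{21}}{M_{S\triangle}} - \frac{n_{12}}{M_{S\triangle}} \right), $$

$$ \dot{n}_{11} = 2\gamma n_{12} - (\beta + \gamma) n_{11} + \left( T_3 \delta_{tt} + T_2 \delta_{lt} \right) \left( \frac{2 n_{20}}{M_{S\triangle}} - \frac{n_{11}}{M_{S\triangle}} \right). \qquad (7) $$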

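An additional differential equation can be solved for the epidemic prevalence,
$I$, at any time.

$$ \begin{aligned}
\dot{I} &= -\dot{S} - \gamma I \\
        &= -\frac{d}{dt} g(\theta_2, \theta_3) - \gamma I \\
        &= -\dot{\theta}_2\, g^{(x)}(\theta_2, \theta_3) - \dot{\theta}_3\, g^{(y)}(\theta_2, \theta_3) - \gamma I. \qquad (8)
\end{aligned} $$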
If an initial fraction $\epsilon \ll 1$ of the population is infected at the beginning of
the epidemic, we use the initial conditions



02(0) 

MsiiO) 
MssiO) 
n3o(0) 
n2i(0) 



e 



e 



(1 - 2e)M 
(1 - e)M 
eM, 



eM 



(9) 



and the remaining variables would be zero. 
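
To make the structure of equations (3)-(8) concrete, the sketch below evaluates their right-hand side in Python. It is an illustration under stated assumptions, not the authors' implementation: the pgf is taken to be $g(x,y) = \exp(\lambda_l (x-1) + \lambda_t (y-1))$ (independent Poisson numbers of lines and triangles, chosen only because its derivatives are simple), the equation for $n_{30}$ is written by the same bookkeeping as the other triangle classes, and `initial_state` uses illustrative small-$\epsilon$ values rather than reproducing equation (9). All function and variable names are hypothetical.

```python
import math

def clustered_sir_rhs(state, beta, gamma, lam_l, lam_t):
    """Time derivatives of (theta2, theta3, M_SS, M_SI, n30, n21, n20, n12, n11, I).

    Assumes lam_l > 0 and lam_t > 0 and the Poisson pgf described in the lead-in.
    """
    th2, th3, m_ss, m_si, n30, n21, n20, n12, n11, prev = state
    g = math.exp(lam_l * (th2 - 1.0) + lam_t * (th3 - 1.0))
    g_x, g_y = lam_l * g, lam_t * g      # first derivatives of g at (theta2, theta3)
    m_s = th2 * g_x                      # stubs attached to susceptible nodes
    m_s_tri = th3 * g_y                  # triangle links attached to susceptible nodes
    # Mean excess degrees; for independent Poisson counts they do not depend
    # on whether the node was reached through a line or a triangle.
    d_ll = d_tl = th2 * lam_l
    d_lt = d_tt = th3 * lam_t
    t2 = beta * m_si
    t3 = beta * (2.0 * n21 + 2.0 * n12 + n11)
    line_flux = t2 * d_ll + t3 * d_tl    # excess lines carried by new infections
    tri_flux = t3 * d_tt + t2 * d_lt     # excess triangles carried by new infections
    d_th2 = -th2 * t2 / m_s
    d_th3 = -th3 * t3 / m_s_tri
    d_mss = -2.0 * m_ss / m_s * line_flux
    d_msi = -(beta + gamma) * m_si + line_flux * (m_ss - m_si) / m_s
    d_n30 = -tri_flux * 3.0 * n30 / m_s_tri
    d_n21 = -(2.0 * beta + gamma) * n21 + tri_flux * (3.0 * n30 - 2.0 * n21) / m_s_tri
    d_n20 = gamma * n21 - tri_flux * 2.0 * n20 / m_s_tri
    d_n12 = (2.0 * beta * n21 - (2.0 * beta + 2.0 * gamma) * n12
             + tri_flux * (2.0 * n21 - n12) / m_s_tri)
    d_n11 = (2.0 * gamma * n12 - (beta + gamma) * n11
             + tri_flux * (2.0 * n20 - n11) / m_s_tri)
    d_prev = -(d_th2 * g_x + d_th3 * g_y) - gamma * prev
    return [d_th2, d_th3, d_mss, d_msi, d_n30, d_n21, d_n20, d_n12, d_n11, d_prev]

def initial_state(eps, lam_l, lam_t):
    """Illustrative initial state for a small seeded fraction eps (an assumption)."""
    m = lam_l            # g^{(x)}(1,1) for the assumed pgf
    m_tri = lam_t / 3.0  # g^{(y)}(1,1)/3 for the assumed pgf
    return [1.0 - eps, 1.0 - eps, (1.0 - 2.0 * eps) * m, eps * m,
            (1.0 - 3.0 * eps) * m_tri, 3.0 * eps * m_tri, 0.0, 0.0, 0.0, eps]
```

The returned list can be passed to any standard ODE integrator. Because $M_S = \theta_2 g^{(x)}(\theta_2, \theta_3)$ and likewise for triangles, the last component reduces to $T_2 + T_3 - \gamma I$, which is a convenient check on an implementation.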

1.1. Bond percolation 

Bond percolation solutions for the giant component size were independently
derived in [10, 11]. These solutions were based on undirected bond percolation,
such that each edge is "occupied", or transmits infection, with independent
probability $\tau$. Setting the edge occupation probability to the transmission
probability per partnership, which yields $\tau = \beta/(\beta + \gamma)$ in the Poisson process
considered here, the giant component size approximately corresponds to the
final size of an epidemic under certain conditions [21] (the correspondence is
exact when the infectious period is constant). Derivations were also provided in
[10, 11] for the probability that an epidemic will occur following a single
introduction in clustered networks, and the threshold transmissibility $\tau$ for epidemics
to be possible (i.e. where a giant component forms). But these derivations did
not account for variable infectious periods in a realistic epidemiological setting.
It is now understood that the undirected bond percolation giant component
size is only an approximation for the final epidemic size when the infectious
period is not constant, albeit a very good approximation [15, 17]. And the bond
percolation solutions for the probability of an epidemic can be very biased.

In contrast to what is observed in [17], we find that the bond percolation
solution for final size is not always a good approximation in networks with
clustering. Recently an alternative percolation technique was developed in
[14] which correctly accounts for variable infectious periods and can accurately
calculate final sizes in clustered networks, and the techniques described in [7]
could also take this into account.

The solutions in [10, 11] were based on the calculation of the probability that
there would be 0, 1 or 2 secondary infections following an initial infection in a 3
clique. If the infectious period is $t$ and assuming a constant rate of transmission,
the transmission probability to one neighbor would be $\tau(t) = 1 - e^{-\beta t}$. We will
denote the mean transmissibility as $\bar{\tau} = \beta/(\beta + \gamma)$. The bond percolation
solutions in [10, 11] were based on the idea that each edge is occupied with
independent probability $\bar{\tau}$, which implies that the probability of having 1 or 2
secondary infections in a 3 clique is

• 1 secondary infection: $\bar{\alpha}_1 := 2\bar{\tau}(1 - \bar{\tau})^2$,
• 2 secondary infections: $\bar{\alpha}_2 := \bar{\tau}^2 + 2\bar{\tau}^2(1 - \bar{\tau})$.

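In fact, these probabilities are functions of the infectious period of the initial
case in the 3 clique, which is itself an exponentially distributed random variable.
We can solve for the true probabilities by integrating over the infectious period.

• 1 secondary infection:

$$ \alpha_1 := \int_0^\infty \gamma e^{-\gamma t} \left( 2 (1 - e^{-\beta t}) e^{-\beta t} (1 - \bar{\tau}) \right) dt $$

• 2 secondary infections:

$$ \alpha_2 := \int_0^\infty \gamma e^{-\gamma t} \left( (1 - e^{-\beta t})^2 + 2 (1 - e^{-\beta t}) e^{-\beta t} \bar{\tau} \right) dt $$

$$ = 1 + (1 - 2\bar{\tau}) \frac{\gamma}{2\beta + \gamma} + 2 (1 - \bar{\tau})(\bar{\tau} - 1) $$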

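This distribution is generally different from the one based on $\bar{\alpha}_1$ and $\bar{\alpha}_2$
(the probabilities computed from the mean transmissibility $\bar{\tau}$), and the
expected number of secondary infections is strictly less with variable infectious
periods. To see this, denote the averages $R = 2\alpha_2 + \alpha_1$ and $\bar{R} = 2\bar{\alpha}_2 + \bar{\alpha}_1$,
and note that only 2nd order terms of $\tau$ will differ between $R$ and $\bar{R}$. We have
$\bar{\tau}^2 = \beta^2/(\beta + \gamma)^2$, and

$$ \langle \tau^2 \rangle = \int_0^\infty \gamma e^{-\gamma t} (1 - e^{-\beta t})^2 \, dt = \frac{2\beta^2}{(\beta + \gamma)(2\beta + \gamma)}. \qquad (10) $$

It is easy to see that $\langle \tau^2 \rangle > \bar{\tau}^2$. Furthermore, if we collect all terms involving
$\tau^2$ in the equation for $R$, we find a leading factor of $-2\bar{\tau}$. Consequently, these
terms will be negative and will have larger magnitude in the expression for $R$
than for $\bar{R}$, so $R < \bar{R}$.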
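
The comparison can be checked numerically. The sketch below evaluates both sets of probabilities in closed form, using $E[e^{-\beta t}] = \gamma/(\beta+\gamma)$ and $E[e^{-2\beta t}] = \gamma/(2\beta+\gamma)$ for an exponentially distributed infectious period with rate $\gamma$. The function names are hypothetical, and the closed form used for the one-secondary-infection probability is worked out here from the integral rather than quoted from the text.

```python
def secondary_infection_probs(beta, gamma):
    """Return ((a1_bar, a2_bar), (a1, a2)) for a single 3 clique.

    The first pair uses the mean transmissibility on every edge (bond
    percolation); the second pair integrates over the exponentially
    distributed infectious period of the initial case.
    """
    tau_bar = beta / (beta + gamma)
    e1 = gamma / (beta + gamma)          # E[exp(-beta t)]
    e2 = gamma / (2.0 * beta + gamma)    # E[exp(-2 beta t)]
    a1_bar = 2.0 * tau_bar * (1.0 - tau_bar) ** 2
    a2_bar = tau_bar ** 2 + 2.0 * tau_bar ** 2 * (1.0 - tau_bar)
    # E[(1 - e^{-bt}) e^{-bt}] = e1 - e2 and E[(1 - e^{-bt})^2] = 1 - 2 e1 + e2
    a1 = 2.0 * (1.0 - tau_bar) * (e1 - e2)
    a2 = (1.0 - 2.0 * e1 + e2) + 2.0 * tau_bar * (e1 - e2)
    return (a1_bar, a2_bar), (a1, a2)

def mean_secondary(a1, a2):
    """Expected number of secondary infections in the clique."""
    return a1 + 2.0 * a2
```

For $\beta = \gamma = 1$ this gives a mean of 1.25 secondary infections under bond percolation and 7/6 when the infectious period is integrated out.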

Below, we consider an approximate bond percolation calculation for final
size. We propose that the number of secondary infections in each 3 clique to
which an infected belongs is generated by $1 - \alpha_1 - \alpha_2 + \alpha_1 x + \alpha_2 x^2$, with the
probabilities $\alpha$ calculated above that take the infectious period into account.
With this modification, the giant component size can be calculated as in [10].
However, it is still only an approximation because it assumes that the numbers
of secondary infections are independent in multiple cliques to which an infected
belongs. In general, because $R < \bar{R}$, this solution will underestimate final size,
while the solution in [10, 11] will overestimate final size.

1.2. Alternative models 

We compare solutions of the system (3)-(7) to stochastic simulations in
continuous time. The simulations are based on the Gillespie algorithm [22].
Random networks are generated as described above. At time $t = 0$, a number
$I(0)$ of initial infections are selected uniformly at random in the graph. When a
susceptible is infected, new transmission and recovery events are queued with
exponentially distributed waiting times.
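
A simulation of this kind can be sketched with an event queue. The code below is one possible reading of the description, not the authors' simulator: the function name, the adjacency-list input, and the use of a binary heap are all assumptions. When a node is infected, its recovery time is drawn, and a transmission event is queued for each neighbor whose exponential waiting time falls before that recovery.

```python
import heapq
import random

RECOVERY, TRANSMISSION = 0, 1

def simulate_sir(adj, beta, gamma, initial_infected, seed=None):
    """Event-driven SIR epidemic on a network.

    adj maps each integer node id to a list of neighbors; initial_infected
    is a list of distinct nodes infected at time 0. Returns a list of
    (time, cumulative infections) pairs and the final status of every node.
    """
    rng = random.Random(seed)
    status = {v: "S" for v in adj}
    events = []  # heap of (time, kind, node)

    def infect(v, t):
        status[v] = "I"
        t_rec = t + rng.expovariate(gamma)
        heapq.heappush(events, (t_rec, RECOVERY, v))
        for w in adj[v]:
            t_tr = t + rng.expovariate(beta)
            # Only transmissions that occur before recovery can succeed.
            if t_tr < t_rec and status[w] == "S":
                heapq.heappush(events, (t_tr, TRANSMISSION, w))

    cumulative = 0
    for v in initial_infected:
        infect(v, 0.0)
        cumulative += 1
    history = [(0.0, cumulative)]
    while events:
        t, kind, v = heapq.heappop(events)
        if kind == RECOVERY:
            status[v] = "R"
        elif status[v] == "S":  # ignore attempts on nodes already infected
            infect(v, t)
            cumulative += 1
            history.append((t, cumulative))
    return history, status
```

Repeating the call with different seeds produces an ensemble of trajectories of the kind that the ODE solution is compared against.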

We also compare the clustering model to a recently proposed system of ODEs
based on moment-closure [16]. This model was developed for networks with a
given degree distribution generated by $G(x)$ and a clustering coefficient $\phi$. This
system does not specify a joint distribution for the number of lines and triangles.
Rather, this system is based on the idea that potential 3 cliques, of which a
degree $k$ node will have $\binom{k}{2}$, will exist with independent probability $\phi$. This
system also uses PGFs within a low-dimensional system of ODEs, and proposes
that $S = G(\theta)$, with $\dot{\theta} = -\theta \beta [SI]/M_S$, where $[SI]$ is the number of half-edges
from a susceptible node that terminate at an infectious node. Equations for
$[SI]$ are derived in terms of the number of connected triples, or 2-paths, of nodes
which pass through a susceptible. This model makes the approximation that
the number of 2-paths connecting two susceptibles and an infected is a simple
function of the clustering coefficient $\phi$; the number of 2-paths connecting a
susceptible with two infecteds is approximated in a similar way.
We will subsequently refer to this as the House-Keeling (HK) model.



2. Results 

Many of our results were generated for the purpose of explicating the interac- 
tion between the variance of the degree distribution and the level of clustering. 
The clustering model is especially well suited for investigating this problem, 
since it remains low dimensional even with many degree classes. Previous re- 
search has elucidated the importance of the variance of the degree distribution 
on the location of epidemic thresholds and the final size [2, 3]. In particular,
this research has shown that in networks with power law degree distributions, 
as the variance of the degree distribution diverges to infinity, so too does the 
reproduction number, and the threshold transmissibility vanishes. 

To simultaneously investigate heterogeneous degree distributions and
clustering, we constructed a degree distribution based on the negative binomial
(NB) distribution which allows us to hold the mean of the distribution constant
while interpolating over a wide range of variances bounded between the mean of
the distribution and infinity. The NB distribution with parameters $p$ and $r$ is
generated by

$$ g_{nb}(x; r, p) = \left( \frac{1 - p}{1 - p x} \right)^r. \qquad (11) $$

We will modify this distribution so that an expected fraction $p_t$ of links are to
3 cliques and 2 cliques always appear in pairs. Each edge will occur as part of a
pair, which may form part of a 3 clique (with probability $p_t$), or may simply be
a pair of edges with two nodes that are not themselves connected. Then given
a random number $k$ of 2-tuples generated by equation (11), the number of lines and
triangles is generated by $((1 - p_t) x^2 + p_t y)^k$, where $y$ is the dummy variable
for 3 cliques, and $x$ is the dummy variable for 2 cliques. Using the composition
property of pgfs, the degree distribution is generated by

$$ g(x, y) = g_{nb}\left( (1 - p_t) x^2 + p_t y \right). \qquad (12) $$

Since lines always appear in pairs, it is easy to keep the mean of the distribution
constant while tuning the amount of clustering with $p_t$, which can range between
zero and one.
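
The claim that the mean degree does not depend on the clustering parameter can be verified with a few lines of Python. The sketch assumes the NB pgf takes the form $((1-p)/(1-px))^r$, one common parameterization; the function names are hypothetical. A line contributes one edge to a node and a triangle contributes two, so the mean degree is $g^{(x)}(1,1) + 2 g^{(y)}(1,1)$.

```python
def g_nb(z, r, p):
    """NB pgf under the assumed parameterization ((1 - p)/(1 - p z))^r."""
    return ((1.0 - p) / (1.0 - p * z)) ** r

def dg_nb(z, r, p):
    """First derivative of g_nb with respect to z."""
    return r * p / (1.0 - p * z) * g_nb(z, r, p)

def g(x, y, r, p, pt):
    """Joint pgf of lines (x) and triangles (y), as in equation (12)."""
    return g_nb((1.0 - pt) * x ** 2 + pt * y, r, p)

def g_x(x, y, r, p, pt):
    """Partial derivative of g with respect to x (chain rule)."""
    return dg_nb((1.0 - pt) * x ** 2 + pt * y, r, p) * 2.0 * (1.0 - pt) * x

def g_y(x, y, r, p, pt):
    """Partial derivative of g with respect to y (chain rule)."""
    return dg_nb((1.0 - pt) * x ** 2 + pt * y, r, p) * pt

def mean_degree(r, p, pt):
    """Mean number of edges per node: one per line, two per triangle."""
    return g_x(1.0, 1.0, r, p, pt) + 2.0 * g_y(1.0, 1.0, r, p, pt)
```

With $r = 1$ and $p = 0.5$ the mean degree is 2 for every value of the clustering parameter, while the mean number of triangles per node, `g_y(1, 1, r, p, pt)`, grows linearly with it.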

Figure 1 shows a comparison of the clustering model to 50 stochastic
simulations on random networks with 5000 nodes and 10 initial infections. The degree
distribution was generated by equation (12) with a mean of 2 and a variance of
3. The fraction of links to 3 cliques was $p_t = 90\%$. For comparison, we also plot
a solution to the clustering model (red line) with $p_t = 0$, so that there is no
clustering. Clustering has the effect of slowing down the epidemic and reducing
the number ultimately infected. The system of equations (3)-(7) correctly
predicts the final size, while the trajectory passes through the central mass of
simulated trajectories; the analytical model approximately corresponds to the
median time for a stochastic simulation to reach a given prevalence (results not
shown).

We show the effects of clustering on the final size of the epidemic in Figure 2.
The clustering model (black line, equations (3)-(7)) correctly reproduces the final
epidemic size observed in simulations (box plots). The MN percolation solution
proposed in [10, 11] (red dashed line) is noticeably biased for non-zero clustering,
although that solution does trend downwards correctly. Over-estimation by the
MN model is expected, as in the last section we showed that the MN model
will overestimate the number of secondary infections within a 3 clique when the
infectious period is not constant.

To calibrate the HK model with our chosen $p_t$, we used the univariate
generating function

$$ G(x) = g(x, x^2), $$

since there are two edges for every 3 clique. And we set $\phi$ to be the clustering
coefficient in the network, which as shown in [11] is the ratio of 3 times the
number of triangles, which we denote $N_\triangle$, to the number of 2-paths in the
network, which we denote $N_3$. We have

$$ \phi = 3 N_\triangle / N_3 = \frac{g^{(y)}(1,1)}{G''(1)/2}. $$

The HK clustering model also overestimates final size; this is not unexpected,
since the HK model is not premised on the introduction of 3 cliques, but is
rather an approximation in the event that every potential triangle exists
with independent probability $\phi$. The lack of alignment of the HK model and
equations (3)-(7) indicates that accurate accounting of the effects of clustering
requires consideration not only of macroscopic properties like the clustering
coefficient, but also the details of how clustering is introduced into the network.
Furthermore, as we show below, the discrepancy between the HK and clustering
model is greatest when the variance of the degree distribution is low, which is
the case in Figure 2: the variance is set at its lower bound, and is equal to the
mean (2).

Figure 1: The cumulative number of infections is shown versus time. Fifty stochastic
simulations are compared to the solution of equations (3)-(7). The degree distribution is
generated by equation (12). $N = 5000$, $I(0) = 10$, $p_t = 0.9$, $\beta = 1$, and $\gamma = 1$. For
comparison, a trajectory with $p_t = 0$ is shown in red.

Figure 2: A comparison of the final epidemic size versus the level of clustering in the network.
The degree distribution is generated by equation (12). The black line corresponds to the solution
of equations (3)-(7). The boxplots illustrate the 90% confidence interval from 50 stochastic
simulations on networks with 5000 nodes. $\beta = \gamma = 1$, and both the mean and variance of the
degree distribution are 2.

A more detailed characterization of the interplay of variance and clustering
is illustrated in Figure 3. The upper left panel shows the final size predicted
by the clustering model (equations (3)-(7)). As expected, final size decreases with
increasing clustering (the fraction of links to 3 cliques). And in agreement with
previous studies, the final size usually decreases with increasing variance. There
is an exception, however, when the variance is very small, and clustering is high.
In this region, with variance between 1 and 1.5, we see that final size can actually
increase with larger variance.

The remaining panels in Figure 3 show the discrepancy between the
clustering model and an alternative calculation of final size. These heatmaps were
calibrated to have the same color scale. The bias increases with clustering in
all cases. But bias is insubstantial when the variance is large, even if clustering
is also large. This can partially be explained by noting the nonlinear
relationship between $p_t$ and the clustering coefficient. Given a constant fraction of 3
cliques, $p_t$, the number of 3 cliques in the network is

$$ N_\triangle \propto \frac{1}{3} \sum_{l,t} t\, p_{l,t}, $$

which is clearly constant with respect to the variance of the degree distribution
(holding the mean constant). But the number of 2 paths is

$$ N_3 \propto \sum_{k = l + 2t} \binom{k}{2} p_{l,t}, $$

which clearly increases with the second moment of the distribution ($\sum_k p_k k^2$).
So, increasing the variance of the distribution (holding the mean constant) has
the effect of decreasing the ratio of $N_\triangle$ to $N_3$. The clustering coefficient,
$\phi = 3 N_\triangle / N_3$, is more important than the total number of 3 cliques in determining
epidemic outcomes, and as we increase variance, $\phi$ converges to zero, and the
clustering model converges to the percolation and HK model solutions. To
understand why $\phi$ rather than $N_\triangle$ is the important quantity for determining
final size, note that as we increase the variance of the degree distribution, the
mean excess degree, $G''(1)/G'(1)$, increases. The number of 2 paths through a
node of degree $k$ is $\binom{k}{2}$. So if we consider a node with mean excess degree
$\bar{k} = G''(1)/G'(1)$, which is the mean degree of a new infected early in the epidemic,
the probability that two neighbors of that node are themselves connected is

$$ \frac{p_t}{\bar{k} - 1} \approx \frac{p_t\, G'(1)}{G''(1)}, $$

which will decrease with variance of the degree distribution.
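
The dependence of the clustering coefficient on variance can be made explicit with a short calculation. The sketch below assumes the construction in which each node receives $k$ 2-tuples, each becoming a triangle membership with probability $p_t$ and a pair of lines otherwise, so that a node has degree $2k$. Then 3 times the number of triangles is proportional to $p_t E[k]$ and the number of 2-paths is proportional to $E[\binom{2k}{2}] = E[k] + 2E[k(k-1)]$. The function name and its parameterization by the mean and variance of $k$ are hypothetical.

```python
def clustering_coefficient(pt, mean_k, var_k):
    """Clustering coefficient phi = 3 N_triangle / N_3 for the paired-edge model.

    mean_k and var_k are the mean and variance of the number of 2-tuples
    per node; each node has degree 2k.
    """
    second_factorial_moment = var_k + mean_k ** 2 - mean_k   # E[k(k-1)]
    closed_two_paths = pt * mean_k                           # 3 x triangles per node
    two_paths = mean_k + 2.0 * second_factorial_moment       # E[C(2k, 2)]
    return closed_two_paths / two_paths

# phi falls as the variance grows while the mean and pt stay fixed
phis = [clustering_coefficient(0.9, 1.0, v) for v in (1.0, 2.0, 5.0, 20.0)]
```

With a mean of 1 and full clustering, a variance of 2 gives a clustering coefficient of 0.2, and larger variances drive it toward zero.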

Figure 4 illustrates the impact of clustering and variance of the degree
distribution on dynamical aspects of the epidemic. We consider the time to
peak incidence, defined as $t_p = \mathrm{argmax}_t(-\dot{S}(t))$. Clustering always has the
effect of slowing down the epidemic and increasing $t_p$. Variance always has the
effect of speeding up an epidemic and decreasing $t_p$. Of the two, it appears that
$t_p$ is much more elastic with respect to variance than $p_t$. The HK model is in
close agreement with the clustering model (equations (3)-(7)), but can differ by as
much as 10% when $p_t$ is large. The percolation methods are not represented in
this figure, as they are uninformative about the timescale of the epidemic.
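
Given a sampled trajectory of the susceptible fraction, the time to peak incidence can be estimated with a finite difference. This is a minimal sketch with a hypothetical function name; it reports the midpoint of the sampling interval with the largest decrease in the susceptible fraction per unit time, so its resolution is limited by the sampling grid. The logistic curve used for the demonstration is synthetic and not model output.

```python
import math

def time_to_peak(times, susceptible):
    """Midpoint of the interval on which -dS/dt is largest."""
    best_time, best_rate = None, float("-inf")
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        rate = -(susceptible[i] - susceptible[i - 1]) / dt
        if rate > best_rate:
            best_rate = rate
            best_time = 0.5 * (times[i] + times[i - 1])
    return best_time

# Synthetic example: a logistic decline whose steepest point is at t = 5.
demo_times = [0.5 * i for i in range(21)]
demo_s = [1.0 / (1.0 + math.exp(t - 5.0)) for t in demo_times]
t_peak = time_to_peak(demo_times, demo_s)
```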

3. Conclusion 

The model presented here is a generalization of the one presented in [15].
Several other generalizations of that model have been presented [23], including
simultaneous network dynamics, such as edge swapping [20, 24], populations
with heterogeneous contact rates, multiple edge types with distinct
transmission rates [26], preferential attachment [26], and growing networks with natural
birth and mortality [27]. It is likely that these approaches could be combined
with the model presented here. For example, it would be straightforward to
consider epidemics in clustered networks that also have dynamically
rearranging ties. Other straightforward generalizations might include larger clique sizes,
or the inclusion of network motifs other than cliques. This model should be
generalizable to the more general class of bipartite random networks presented
in [13].

Figure 3: Upper left: The final epidemic size is shown as a function of clustering (the fraction
of links that go to 3 cliques) and the variance of the degree distribution. All results are based
on a solution of equations (3)-(7) with $\beta = \gamma = 1$. The remaining panels show the difference
between the clustering model (equations (3)-(7)) and an alternative solution for the final size.

Figure 4: Left: The time to peak epidemic incidence ($t_p$) as a function of the fraction of links
to 3 cliques, $p_t$, and the variance of the degree distribution. Results are based on a solution to
equations (3)-(7) with $\beta = \gamma = 1$ and a degree distribution generated by equation (12). Right: The
discrepancy between the $t_p$ predicted by the HK model and the clustering model (equations
(3)-(7)).

Regarding the potential for empirical applications, network samples
increasingly provide the information necessary to parameterize these models. Degree
distributions and clustering coefficients are often ascertained in social network
studies [28, 29]. Epidemiological surveillance data often also provide partnership
durations and measures of concurrency [28, 30].

The problem of SIR dynamics in clustered networks with arbitrary degree
distributions was also investigated in a recent manuscript [16] (the HK model).
We have compared that model to ours by calibrating the clustering coefficient $\phi$
to match the fraction of links to 3 cliques, $p_t$. The models are in close agreement
when the variance of the degree distribution is high, but substantial differences
in both the final size and timescale of the epidemic exist when the degree
distribution is homogeneous and when $p_t$ is large. This comparison shows that
epidemic dynamics depend on more than just the clustering coefficient, and also on
details of how clustering is introduced into the network. While the HK model
is more parsimonious than the one presented here (it has fewer variables), we
speculate that our model will be a good alternative for data with well defined
cliques, such as human populations with household structure [31, 32].